Abstract: Suppose that a random $n$-bit number $V$ is multiplied by an odd constant $M \ge 3$, by adding
shifted versions of the number $V$ corresponding to the 1s in the binary representation of the constant $M$.
Suppose further that the additions are performed by carry-save adders until the number of summands is
reduced to two, at which time the final addition is performed by a carry-propagate adder. We show that
in this situation the distribution of the length of the longest carry-propagation chain in the final addition is
the same (up to terms tending to 0 as $n \to \infty$) as when two independent $n$-bit numbers are added, and in
particular the mean and variance are the same (again up to terms tending to 0). This result applies to all
possible orders of performing the carry-save additions.



1. Introduction 

Let $X$ and $Y$ be random $n$-bit integers that are independent and uniformly distributed in $[0, 2^n - 1]$. If they are added in the usual way, starting at their rightmost end and proceeding to the left, there may be various "carry-propagation chains". A carry-propagation chain is a sequence of $k \ge 1$ consecutive positions in the binary representations of $X$ and $Y$ in which the rightmost position generates a carry (because both $X$ and $Y$ contain 1s in this position), and the remaining $k - 1$ positions to the left propagate this carry (because exactly one of $X$ and $Y$ contains a 1 in each of these positions). Let the random variable $C_n$ denote the length of the longest carry-propagation chain. (Note that the longest carry-propagation chain is not necessarily the longest sequence of consecutive carries: the addition of the binary numbers 0101 and 1111 gives rise to two carry-propagation chains, each of length two, not to one of length four.) The length of the longest carry-propagation chain is of interest because it governs the execution time of certain parallel implementations of addition (see Claus [C] and Knuth [K]).
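
The definition can be made concrete with a small sketch. The function name `longest_carry_chain` and its argument order are illustrative choices, not taken from the paper; the code scans the positions from right to left and classifies each one as generating, propagating, or neither.

```python
def longest_carry_chain(x: int, y: int, n: int) -> int:
    """Length of the longest carry-propagation chain when adding n-bit x and y."""
    longest = 0
    current = 0  # length of the chain being extended; 0 means no chain in progress
    for pos in range(n):  # scan from the rightmost position to the left
        xb = (x >> pos) & 1
        yb = (y >> pos) & 1
        if xb and yb:
            current = 1  # generate: a new chain of length 1 starts here
        elif xb != yb:
            current = current + 1 if current else 0  # propagate: extends a chain in progress
        else:
            current = 0  # neither: any chain in progress ends
        longest = max(longest, current)
    return longest


# The example from the text: 0101 + 1111 has two chains of length two.
example = longest_carry_chain(0b0101, 0b1111, 4)  # 2
```

Note that a generating position inside a run of carries restarts the count, which is why the example yields 2 rather than 4.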

The distribution of $C_n$ has been investigated since the early days of electronic computing. The investigation was begun in the famous report of Burks, Goldstine and von Neumann [B] in 1946, where it was shown that $\mathrm{Ex}(C_n) \le \log_2 n + 1$. The next step was taken by Claus [C], who showed that $\mathrm{Ex}(C_n) \ge \log_2 n - 2$. Knuth [K] showed that

$$\Pr(C_n \ge k) = 1 - e^{-n/2^{k+1}} + O\!\left(\frac{(\log n)^{\ldots}}{n}\right) \tag{1.1}$$

(where the constant in the $O$-term is independent of $k$ as well as $n$), and that this implies

$$\mathrm{Ex}(C_n) = \log_2 n + \gamma \log_2 e - \frac{3}{2} - \Phi(\log_2 n) + O(\ldots), \tag{1.2}$$

where $\gamma = 0.5772\ldots$ is Euler's constant, $e = 2.718\ldots$ is the base of natural logarithms, and $\Phi(\nu)$ is a periodic function of $\nu$ with period 1 and average 0 (that is, $\int_0^1 \Phi(\nu)\,d\nu = 0$) satisfying $|\Phi(\nu)| < 1.573\ldots \times 10^{-6}$ for all $\nu \in [0, 1)$. Pippenger [P] gave an elementary derivation of (1.1), and showed that it also implies

$$\mathrm{Var}(C_n) = \frac{\pi^2}{6}(\log_2 e)^2 + \frac{1}{12} + \omega + \Psi(\log_2 n) + O(\ldots), \tag{1.3}$$

where $\pi = 3.14159\ldots$ is the circular ratio, $\omega = 1.2374\ldots \times 10^{-12}$ is a constant, and $\Psi(\nu)$ is a periodic function of $\nu$ with period 1 and average 0 satisfying $|\Psi(\nu)| < 5.3573\ldots \times 10^{-\ldots}$ for all $\nu \in [0, 1)$.

In Section 2 we shall present a new analysis of the addition problem that yields results similar to those above, though with weaker error bounds. Specifically, we shall show that

$$\Pr(C_n \ge k) = 1 - e^{-n/2^{k+1}} + O\!\left(\frac{\log n}{n^{1/3}}\right). \tag{1.4}$$

This implies

$$\mathrm{Ex}(C_n) = \log_2 n + \gamma \log_2 e - \frac{3}{2} - \Phi(\log_2 n) + O\!\left(\frac{(\log n)^2}{n^{1/3}}\right) \tag{1.5}$$

and

$$\mathrm{Var}(C_n) = \frac{\pi^2}{6}(\log_2 e)^2 + \frac{1}{12} + \omega + \Psi(\log_2 n) + O\!\left(\frac{(\log n)^3}{n^{1/3}}\right) \tag{1.6}$$

in the same way that (1.1) implies (1.2) and (1.3). The weaker error bounds are a result of our choice to present our new argument in its simplest form; these bounds could be improved by elaboration of the argument (but, as Knuth [K] points out, so could those of (1.1)-(1.3)). Our motivation, however, for presenting this new analysis is that it can be extended to obtain the results claimed in the abstract, which we shall now describe in more detail.

We shall investigate the length of the longest carry-propagation chain that occurs when a random $n$-bit integer $V$, uniformly distributed in $[0, 2^n - 1]$, is multiplied by a fixed constant $M$. The simplest case of our problem is $M = 3$. In this case, the product $Z = M \cdot V$ is obtained by adding $V$ to the number $2V$ that is obtained by shifting $V$ one position to the left. The two random numbers being added in this case are not independent, but Izsak [I] has shown that the length of the longest carry-propagation chain nevertheless satisfies the estimate (1.1). More generally, we may consider the case $M = 2^d + 1$ (where $d \ge 1$), for which the product $Z = M \cdot V$ is obtained by adding $V$ to the number $2^d V$ that is obtained by shifting $V$ to the left $d$ positions. Izsak [I] has shown that again the estimate (1.1) applies (where now the constant in the $O$-term may depend on $d$, but not on $k$ or $n$).

We shall consider a further generalization in which $M$ has two or more 1s in its binary representation. Suppose that the binary representation of $M$ is $M = \sum_{0 \le i \le d} m_i 2^i$ (with $m_i \in \{0, 1\}$) and that $c$ (where $2 \le c \le d + 1$) of the digits $m_0, m_1, \ldots, m_d$ are 1s (so that the remaining $d + 1 - c$ are 0s). We may assume without loss of generality that $m_d = 1$ (since otherwise we could reduce the value of $d$) and that $m_0 = 1$ (since the carries that occur when multiplying by $2M$ will just be shifted versions of those that occur when multiplying by $M$). Let $0 = s_1 < s_2 < \cdots < s_c = d$ be the positions of the 1-bits, so $M = \sum_{1 \le i \le c} 2^{s_i}$. For $1 \le i \le c$, let $W_i = 2^{s_i} V$ be obtained by shifting $V$ to the left $s_i$ positions. The product $Z = M \cdot V$ will be obtained by adding these $c$ numbers: $Z = \sum_{1 \le i \le c} W_i$.

When $c = 3$, we can form the sum $Z = W_1 + W_2 + W_3$ in two stages as follows. The first stage will perform a "carry-save addition", which takes the three numbers $W_1$, $W_2$ and $W_3$ as inputs and produces as outputs two numbers $X$ and $Y$ having the same sum: $X + Y = W_1 + W_2 + W_3$. There are of course many pairs of numbers $X$ and $Y$ that satisfy this condition. The details of carry-save addition, including the specification of the numbers $X$ and $Y$ that will be produced, will be given later. For now we merely observe that in carry-save addition, all carries propagate one position to the left, and in a parallel implementation, all carries propagate simultaneously, so that a carry-save addition contributes a fixed delay to the parallel execution time. Thus our analysis will not deal with carries in this stage. The second stage will perform a conventional "carry-propagate addition" to obtain the final product $Z$ as the sum of $X$ and $Y$. This addition is analogous to those considered in previous paragraphs, and it is the carry-propagation chains in this stage that will be the focus of our analysis. We will obtain the estimate (1.4).

When $c \ge 4$, we can use $c - 2$ carry-save additions to reduce the $c$ numbers $W_1, W_2, \ldots, W_c$ to two numbers $X$ and $Y$ in the first stage, then add these two numbers with a carry-propagate addition in the second stage to obtain $Z$ as before. In this case, however, there is an additional complication: there is more than one way to use $c - 2$ carry-save additions to reduce $c$ numbers to two numbers. At one extreme, one can sum $W_1$, $W_2$ and $W_3$ with the first carry-save addition, then proceed similarly with the resulting $(c - 3) + 2 = c - 1$ numbers, and so forth. The numbers $X$ and $Y$ are thus obtained after $c - 2$ carry-save additions, each (except for the first) of which depends for at least one of its inputs on its predecessor, so that these carry-save additions contribute $c - 2$ fixed delays to the parallel execution time. At the other extreme, one can use $\lfloor c/3 \rfloor$ carry-save additions in parallel to combine $3\lfloor c/3 \rfloor$ numbers, producing $2\lfloor c/3 \rfloor$ numbers having the same sum, then proceed similarly with the resulting $(c - 3\lfloor c/3 \rfloor) + 2\lfloor c/3 \rfloor = c - \lfloor c/3 \rfloor$ numbers, and so forth. As Wallace [W] has observed, these $c - 2$ carry-save additions contribute only $\log_{3/2} c + O(1)$ fixed delays to the parallel execution time. Our result, which is that the estimate (1.4) again holds for the carry-propagate addition in the second stage, applies equally to all of the ways of performing the carry-save additions in the first stage.

All of our results reinforce one point: the randomness in one uniformly distributed number $V$ is sufficient to produce the distribution (1.4); the full power of the independence of $X$ and $Y$ in forming their sum is not needed. In Section 3, we shall give a specification at the bit level of the algorithms that were specified above at the level of operations on numbers, and describe the features, common to all these algorithms, that will be used in the subsequent analysis. In Section 4, we shall give the proof of (1.4) based on these common features.



2. A New Analysis of Addition

In this section, we shall prove (1.4) for the addition of two independent random numbers. The analyses of Knuth [K] and Pippenger [P] of (1.1) proceed by deriving a recurrence for the probability that the addition of two random $n$-bit numbers yields a carry-propagation chain of length at least $k$, then solving this recurrence for the asymptotic behavior of this probability. Our new analysis is based on the observation that the main term $1 - e^{-n/2^{k+1}}$ in (1.1) and (1.4) is the probability that a Poisson-distributed random variable with mean $n/2^{k+1}$ has value at least one. There are approximately $n$ (actually $n - k + 1$) places at which a carry-propagation chain of length $k$ can occur, and the probability that such a chain occurs at a given place is $1/2^{k+1}$. If all these possible occurrences were independent, we could derive the desired result from the Poisson approximation to the binomial distribution. They are not independent, but the effects of their dependence can be analyzed far enough to yield the estimate (1.4). (This analysis is an application of the "Poisson paradigm" described by Alon and Spencer [A].)

A set of $k$ consecutive bit positions will be called a $k$-block. There are $n - k + 1$ distinct $k$-blocks. A $k$-block will be said to be active if its rightmost position generates a carry and each of the remaining $k - 1$ positions propagates a carry. The event "$C_n \ge k$" is clearly equivalent to the event "there is at least one active $k$-block", which we shall denote $E_{n,k}$. To estimate $\Pr[E_{n,k}]$, we shall use the following principles.

(A-1) The probability that a given $k$-block is active is $1/2^{k+1}$.

(A-2) If a set of $k$-blocks includes two that overlap, then they cannot all be active. If no two overlap, then the events of their being active are independent.
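
Both principles can be checked by brute force for small parameters. The following is a minimal sketch, assuming the illustrative values $n = 6$ and $k = 2$; the helper names `is_active` and `count_pairs` are hypothetical and not part of the paper.

```python
def is_active(x: int, y: int, start: int, k: int) -> bool:
    """True if the k-block whose rightmost position is `start` is active."""
    if not ((x >> start) & (y >> start) & 1):  # rightmost position must generate
        return False
    # each of the remaining k-1 positions must propagate
    return all(((x >> pos) ^ (y >> pos)) & 1 for pos in range(start + 1, start + k))


def count_pairs(n: int, k: int, starts: list[int]) -> int:
    """Number of pairs (x, y) of n-bit numbers for which every listed k-block is active."""
    return sum(
        all(is_active(x, y, s, k) for s in starts)
        for x in range(1 << n)
        for y in range(1 << n)
    )


n, k = 6, 2
total = 4 ** n  # number of pairs (x, y)

# (A-1): each of the n-k+1 blocks is active for a fraction 1/2^(k+1) of all pairs
a1_holds = all(count_pairs(n, k, [s]) * 2 ** (k + 1) == total for s in range(n - k + 1))

# (A-2): overlapping blocks are never both active; disjoint blocks are independent
overlap_count = count_pairs(n, k, [0, 1])
disjoint_count = count_pairs(n, k, [0, 3])
```

Here `overlap_count` counts pairs in which two overlapping 2-blocks are both active, and `disjoint_count` counts pairs in which two disjoint 2-blocks are both active; independence corresponds to the latter being `total / 4 ** (k + 1)`.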

We shall show that (1.4) follows from these two principles. Let

$$k_1 = \lceil 2\log_2 n \rceil. \tag{2.1}$$

For $k \ge k_1$, we have $\Pr[E_{n,k}] \le (n - k + 1)/2^{k+1} = O(1/n)$ by (A-1) and Markov's inequality. We also have $1 - e^{-n/2^{k+1}} = O(1/n)$ by the power series $e^x = 1 + O(x)$, valid for $x \to 0$. Thus we have (1.4) for $k \ge k_1$.

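For $k < k_1$, we shall estimate $\Pr[E_{n,k}]$ by inclusion-exclusion, using (A-1) and (A-2). We have

$$\Pr[E_{n,k}] = 1 - \sum_{j \ge 0} \binom{n - j(k-1)}{j} \left(\frac{-1}{2^{k+1}}\right)^j, \tag{2.2}$$

since there are just $\binom{n - j(k-1)}{j}$ ways to choose $j$ non-overlapping $k$-blocks in the $n$ bit-positions. Let

$$k_0 = \left\lfloor \log_2 \frac{3n}{2\log n - 6\log\log n} \right\rfloor, \tag{2.3}$$

so that

$$\frac{1}{3}\log n - \log\log n \le \frac{n}{2^{k_0+1}} \le \frac{2}{3}\log n - 2\log\log n, \tag{2.4}$$

$$e^{-n/2^{k_0+1}} \le \frac{\log n}{n^{1/3}} \tag{2.5}$$

and

$$e^{n/2^{k_0+1}} \le \frac{n^{2/3}}{(\log n)^2}. \tag{2.6}$$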
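
Under (A-1) and (A-2), the inclusion-exclusion formula (2.2) is an exact identity, and this can be confirmed for small $n$ and $k$. The sketch below compares it with a direct dynamic-programming computation over positions, in exact rational arithmetic; the function names are illustrative, and the model assumed is that each position independently generates with probability 1/4, propagates with probability 1/2, and does neither with probability 1/4.

```python
from fractions import Fraction
from math import comb


def pr_chain_exact(n: int, k: int) -> Fraction:
    """Exact Pr[C_n >= k] for two independent uniform n-bit summands."""
    # dist[r]: probability that no chain of length k has occurred yet and the
    # chain currently being extended has length r (r = 0: none in progress)
    dist = {0: Fraction(1)}
    hit = Fraction(0)
    for _ in range(n):
        new: dict[int, Fraction] = {}
        for r, p in dist.items():
            moves = (
                (1, Fraction(1, 4)),                  # generate: new chain of length 1
                (0, Fraction(1, 4)),                  # neither: chain ends
                (r + 1 if r else 0, Fraction(1, 2)),  # propagate: extend if in progress
            )
            for r2, q in moves:
                if r2 >= k:
                    hit += p * q  # a chain of length k has been completed
                else:
                    new[r2] = new.get(r2, Fraction(0)) + p * q
        dist = new
    return hit


def pr_chain_formula(n: int, k: int) -> Fraction:
    """Right-hand side of the inclusion-exclusion formula (2.2)."""
    total = Fraction(0)
    j = 0
    while n - j * (k - 1) >= j:  # beyond this, j disjoint k-blocks no longer fit
        total += comb(n - j * (k - 1), j) * Fraction(-1, 2 ** (k + 1)) ** j
        j += 1
    return 1 - total
```

For instance, with $n = 4$ and $k = 2$ both functions return $23/64$: three possible blocks contribute $3/8$, and the single pair of disjoint blocks subtracts $1/64$.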

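We shall begin by assuming $k \ge k_0$ (as well as $k < k_1$). Let

$$j_0 = \lceil (2e^2/3)\log n \rceil. \tag{2.7}$$

We shall break the sum in (2.2) at $j_0$:

$$\sum_{j \ge 0} \binom{n - j(k-1)}{j} \left(\frac{-1}{2^{k+1}}\right)^j = \sum_{0 \le j \le j_0} \binom{n - j(k-1)}{j} \left(\frac{-1}{2^{k+1}}\right)^j + \sum_{j > j_0} \binom{n - j(k-1)}{j} \left(\frac{-1}{2^{k+1}}\right)^j. \tag{2.8}$$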



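We bound the magnitude of the second sum in (2.8) by using $\binom{n - j(k-1)}{j} \le \binom{n}{j} \le (en/j)^j$, which yields

$$\left| \sum_{j > j_0} \binom{n - j(k-1)}{j} \left(\frac{-1}{2^{k+1}}\right)^j \right| \le \sum_{j > j_0} \binom{n}{j} \left(\frac{1}{2^{k+1}}\right)^j \le \sum_{j > j_0} \left(\frac{en}{j\,2^{k+1}}\right)^j = O\!\left(\frac{1}{n^{2e^2/3}}\right), \tag{2.9}$$

using (2.4) and (2.7): for $j > j_0$ and $k \ge k_0$ these give $en/(j\,2^{k+1}) \le 1/e$, so the last sum is $O(e^{-j_0})$.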



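For the first sum in (2.8), we use (2.1) and (2.7) to estimate the binomial coefficient by $\binom{n - j(k-1)}{j} = (n^j/j!)(1 + O(jk/n))^j = (n^j/j!)(1 + O((\log n)^3/n))$:

$$\sum_{0 \le j \le j_0} \binom{n - j(k-1)}{j} \left(\frac{-1}{2^{k+1}}\right)^j = \sum_{0 \le j \le j_0} \frac{1}{j!} \left(\frac{-n}{2^{k+1}}\right)^j \left(1 + O\!\left(\frac{(\log n)^3}{n}\right)\right).$$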

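The presence of the $O$-term in the summand prevents us from exploiting cancellation after moving the sum inside the $O$-term, so to obtain an error bound for the resulting sum we consider the magnitudes of the summands:

$$\left| \sum_{0 \le j \le j_0} \binom{n - j(k-1)}{j} \frac{(-1)^j}{2^{(k+1)j}} - \sum_{0 \le j \le j_0} \frac{1}{j!} \left(\frac{-n}{2^{k+1}}\right)^j \right| = O\!\left(\frac{(\log n)^3}{n} \sum_{0 \le j \le j_0} \frac{1}{j!} \left(\frac{n}{2^{k+1}}\right)^j\right) = O\!\left(\frac{(\log n)^3}{n}\, e^{n/2^{k+1}}\right) = O\!\left(\frac{\log n}{n^{1/3}}\right),$$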
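using (2.6). Extending the sum from $j \le j_0$ to $j < \infty$ yields

$$\sum_{0 \le j \le j_0} \binom{n - j(k-1)}{j} \left(\frac{-1}{2^{k+1}}\right)^j = e^{-n/2^{k+1}} - \sum_{j > j_0} \frac{1}{j!} \left(\frac{-n}{2^{k+1}}\right)^j + O\!\left(\frac{\log n}{n^{1/3}}\right).$$

We bound the magnitude of the sum over $j > j_0$ just as we did that of the second sum in (2.8), to obtain

$$\sum_{0 \le j \le j_0} \binom{n - j(k-1)}{j} \left(\frac{-1}{2^{k+1}}\right)^j = e^{-n/2^{k+1}} + O\!\left(\frac{1}{n^{2e^2/3}}\right) + O\!\left(\frac{\log n}{n^{1/3}}\right) = e^{-n/2^{k+1}} + O\!\left(\frac{\log n}{n^{1/3}}\right). \tag{2.10}$$

Substituting (2.9) and (2.10) in (2.8), and the result in (2.2), we obtain (1.4) for $k_0 \le k < k_1$.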

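Finally, we consider $k < k_0$. We use the fact that $\Pr[E_{n,k}]$ is a non-increasing function of $k$, so that

$$1 \ge \Pr[E_{n,k}] \ge \Pr[E_{n,k_0}] = 1 - e^{-n/2^{k_0+1}} + O\!\left(\frac{\log n}{n^{1/3}}\right) = 1 + O\!\left(\frac{\log n}{n^{1/3}}\right),$$

using (2.5). This yields (1.4) for the remaining values of $k$.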
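
The estimate (1.4) can also be examined numerically. The following sketch computes $\Pr[C_n \ge k]$ by a floating-point dynamic program and compares it with the Poisson main term $1 - e^{-n/2^{k+1}}$ for every $k \le k_1$. The choice $n = 1000$ and the function name `pr_chain` are illustrative assumptions, and a single value of $n$ of course says nothing about the asymptotic rate.

```python
import math


def pr_chain(n: int, k: int) -> float:
    """Pr[C_n >= k] for independent uniform n-bit summands, by dynamic programming."""
    # dist[r]: no chain of length k so far, chain in progress has length r (0 = none)
    dist = [1.0] + [0.0] * (k - 1)
    for _ in range(n):
        new = [0.0] * k
        alive = sum(dist)
        new[0] += 0.25 * alive          # neither generate nor propagate: chain ends
        if k > 1:
            new[1] += 0.25 * alive      # generate: start a chain of length 1
        new[0] += 0.5 * dist[0]         # propagate with no chain in progress
        for r in range(1, k - 1):
            new[r + 1] += 0.5 * dist[r]  # propagate: extend the chain
        dist = new                      # mass that reaches length k is dropped
    return 1.0 - sum(dist)


n = 1000
k1 = math.ceil(2 * math.log2(n))
errors = [
    abs(pr_chain(n, k) - (1.0 - math.exp(-n / 2 ** (k + 1))))
    for k in range(1, k1 + 1)
]
max_error = max(errors)
bound_scale = math.log(n) / n ** (1 / 3)  # the order of the error term in (1.4)
```

Comparing `max_error` with `bound_scale` gives a rough sense of how conservative the error term in (1.4) is at this size.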



3. The Algorithm for Multiplication 

In this section we shall describe in more detail the algorithm presented in the Introduction. It will be 
most convenient to describe this algorithm in the language of hardware, implemented as circuits built from 
gates interconnected by wires, but this is of course equivalent to a description in the language of software 
for a parallel computer, such as that used by Claus [C] and Knuth [K].

We assume that $M$ is given by its unique conventional binary representation $M = \sum_{0 \le i \le d} m_i 2^i$, in which all digits $m_0 = 1, m_1, \ldots, m_d = 1$ are either 0 or 1. As before, let $0 = s_1 < s_2 < \cdots < s_c = d$ denote the positions of the 1s. Our first step will be to specify the encodings of the numbers $W_1, W_2, \ldots, W_c$ as sequences of bits. The input $V = \sum_{0 \le l \le n-1} v_l 2^l$ is received using $n$ bits $v_0, v_1, \ldots, v_{n-1}$ as usual. Since $V$ is an $n$-bit number (in the range $[0, 2^n - 1]$) and $M$ is a $(d+1)$-bit number (in the range $[0, 2^{d+1} - 1]$), their product $Z = M \cdot V$ is an $(n + d + 1)$-bit number (in the range $[0, (2^n - 1)(2^{d+1} - 1)] \subseteq [0, 2^{n+d+1} - 1]$). Thus it will suffice to represent all numbers produced during the execution of the algorithms (the output $Z$ and all intermediate results) using $n + d + 1$ bits, and to perform all additions (both carry-save and carry-propagate) modulo $2^{n+d+1}$. Thus we shall represent each $W_i$ (for $1 \le i \le c$) by the $n + d + 1$ bits in its conventional binary representation: $W_i = \sum_{0 \le l \le n+d} w_{i,l} 2^l$. Since $W_i = 2^{s_i} V$, we have $w_{i,l} = v_{l - s_i}$ if $s_i \le l \le n - 1 + s_i$, and $w_{i,l} = 0$ if $0 \le l \le s_i - 1$ or $n + s_i \le l \le n + d$. In what follows, we use the terms left and right as usual for the conventional binary representation: position 0 is the rightmost position and position $n + d$ is the leftmost.

In the first stage of the algorithm, we reduce the $c$ summands $W_1, W_2, \ldots, W_c$ to two summands $X$ and $Y$ by means of carry-save adders. Each carry-save adder consists of $n + d + 1$ "full adders", one for each position in the numbers being added. A full adder is a pair of gates that takes three input signals (say $f$, $g$ and $h$) and produces two output signals. The sum output is the parity (that is, the sum $f \oplus g \oplus h$ modulo 2) of the three inputs. The carry output is the majority ($(f \wedge g) \vee (f \wedge h) \vee (g \wedge h)$) of the three inputs. The parity and majority are symmetric functions of the three inputs, so when specifying what signals should be fed into a full adder, we do not need to specify which signal goes into which input. The $n + d + 1$ full adders in a carry-save adder reduce three summands (say $F = \sum_{0 \le l \le n+d} f_l 2^l$, $G = \sum_{0 \le l \le n+d} g_l 2^l$ and $H = \sum_{0 \le l \le n+d} h_l 2^l$) to two summands (say $A = \sum_{0 \le l \le n+d} a_l 2^l$ and $B = \sum_{0 \le l \le n+d} b_l 2^l$) as follows. The signals $f_l$, $g_l$ and $h_l$ are fed into the inputs of the full adder in position $l$ (for $0 \le l \le n + d$). The sum outputs of the full adders become the bits of the summand $A$: $a_l = \mathrm{parity}(f_l, g_l, h_l)$ for $0 \le l \le n + d$. Finally, the carry outputs of the full adders become, after being shifted left one position, the bits of the summand $B$: $b_{l+1} = \mathrm{majority}(f_l, g_l, h_l)$ for $0 \le l \le n + d - 1$ (the carry output from the full adder in the leftmost position is ignored) and $b_0 = 0$ (a 0 bit is shifted into the rightmost position of $B$).
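
On machine integers the whole carry-save adder reduces to a few bitwise operations. The following is a minimal sketch; the function name `carry_save_add` and the explicit `width` parameter (playing the role of $n + d + 1$) are illustrative choices.

```python
def carry_save_add(f: int, g: int, h: int, width: int) -> tuple[int, int]:
    """Reduce three width-bit summands to two with the same sum modulo 2**width."""
    mask = (1 << width) - 1
    a = (f ^ g ^ h) & mask                           # parity at every position
    b = (((f & g) | (f & h) | (g & h)) << 1) & mask  # majority, shifted left one position
    return a, b


# 5 + 3 + 6 = 14: the parities all vanish and the carries alone encode the sum
a, b = carry_save_add(0b0101, 0b0011, 0b0110, 5)  # (0, 14)
```

Masking after the shift discards the carry output of the leftmost position, and the shift itself supplies the 0 bit in the rightmost position of the second summand.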

After the $c$ summands $W_1, W_2, \ldots, W_c$ have been reduced to two summands $X$ and $Y$ by $c - 2$ carry-save adders in the first stage, the summands $X$ and $Y$ are added by a carry-propagate adder in the second stage. Like a carry-save adder, a carry-propagate adder can be built from $n + d + 1$ full adders, one for each position in the numbers being added. Two of the inputs of the full adder in position $l$ (for $0 \le l \le n + d$) are provided by the appropriate bits $x_l$ and $y_l$ of the numbers $X = \sum_{0 \le l \le n+d} x_l 2^l$ and $Y = \sum_{0 \le l \le n+d} y_l 2^l$. In this case the third input of the full adder in position $l$ is fed from the carry output of the full adder in position $l - 1$ for $1 \le l \le n + d$, and is fed the constant 0 for $l = 0$ (the carry output from the full adder in position $n + d$ is ignored). The $n + d + 1$ bits of the final product $Z$ are then produced at the sum outputs of the full adders.
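
The two stages can be put together in a short sketch that also illustrates the two extreme orders of carry-save reduction described in the Introduction. All function names are hypothetical, and the final carry-propagate addition is simply delegated to integer addition modulo $2^{n+d+1}$; the multiplier is assumed to be odd with at least two 1s.

```python
def carry_save_add(f, g, h, mask):
    """Parity word and left-shifted majority word of three summands."""
    return (f ^ g ^ h) & mask, (((f & g) | (f & h) | (g & h)) << 1) & mask


def summands(m: int, v: int, n: int):
    """Shifted copies W_i of v, one per 1 in m, and the mask for n+d+1 bits."""
    d = m.bit_length() - 1
    mask = (1 << (n + d + 1)) - 1
    return [(v << s) & mask for s in range(d + 1) if (m >> s) & 1], mask


def reduce_sequential(nums, mask):
    """Each carry-save addition consumes the outputs of its predecessor."""
    nums = list(nums)
    while len(nums) > 2:
        a, b = carry_save_add(nums[0], nums[1], nums[2], mask)
        nums = [a, b] + nums[3:]
    return nums[0], nums[1]


def reduce_in_rounds(nums, mask):
    """In each round, as many disjoint triples as possible are reduced at once."""
    nums = list(nums)
    while len(nums) > 2:
        groups = len(nums) // 3
        nxt = []
        for i in range(groups):
            nxt.extend(carry_save_add(nums[3 * i], nums[3 * i + 1], nums[3 * i + 2], mask))
        nxt.extend(nums[3 * groups:])  # leftover summands pass to the next round
        nums = nxt
    return nums[0], nums[1]


def multiply(m: int, v: int, n: int, reducer) -> int:
    """First stage: carry-save reduction; second stage: one ordinary addition."""
    nums, mask = summands(m, v, n)
    x, y = reducer(nums, mask)
    return (x + y) & mask
```

The two reducers generally produce different pairs $X$, $Y$, but both pairs sum to $M \cdot V$, which is the property the analysis relies on.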

This description of a carry-propagate adder gives an adequate picture of the production of the outputs, but it is not convenient for the analysis of the longest carry-propagation chain, for which we must distinguish between the generation of carries and their propagation, rather than merely their production. To make the generation and propagation of carries more explicit, we may replace the full adders in the second stage by "half adders". A half adder is obtained from a full adder by substituting the constant 0 for one of its three inputs. The resulting device consists of a pair of gates, one of which computes the sum output as the parity (that is, the "exclusive-OR") of the two remaining inputs, and the other of which computes the carry output as the conjunction (that is, the "AND") of the inputs. If we replace each full adder in the second stage with a half adder, then the carry output of each half adder will indicate whether a carry is generated at that position (that is, whether both $x_l$ and $y_l$ are 1s for that value of $l$), and the sum output will indicate whether a carry would be propagated by that position (that is, whether exactly one of $x_l$ and $y_l$ is a 1).

4. The Analysis of Multiplication 

We begin by deriving the principles, analogous to (A-1) and (A-2), that will allow us to analyze multiplication. A $k$-block is a sequence of $k$ contiguous bit positions among the $n + d + 1$ positions of numbers modulo $2^{n+d+1}$. Thus there are just $n + d - k + 2$ distinct $k$-blocks, with the rightmost position of the rightmost $k$-block being position 0, and the rightmost position of the leftmost $k$-block being position $n + d - k + 1$. We shall say that a $k$-block is active if, in the final addition in the second stage, its rightmost position generates a carry and its remaining $k - 1$ positions propagate a carry. Whether or not a $k$-block is active depends on the input bits not only in its $k$ positions, but also in up to $d$ positions to its right. These $d$ or fewer positions will be called the extension of the $k$-block, and the $k$-block together with its extension will be called an extended $k$-block. (The $d$ rightmost $k$-blocks will have fewer than $d$ positions in their extensions, since there are fewer than $d$ positions to their right.)

The inputs to the final addition are computed by circuits composed of three-input parity and majority gates and zero-input constant gates. Furthermore, constant gates occur only in the circuits computing the rightmost $d$ and leftmost $d + 1$ positions (positions 0 through $d - 1$ and positions $n$ through $n + d$). A $k$-block will be called marginal if it or its extension overlaps the rightmost $d$ or leftmost $d + 1$ positions. Thus there are $3d + 1$ marginal $k$-blocks. A $k$-block will be called central if it is not marginal.

(M-1) The probability that a central $k$-block is active is $1/2^{k+1}$.

Suppose the rightmost position of the $k$-block is position $l$ ($2d \le l \le n - k + 1$). For the rightmost position to generate a carry, the values of both $x_l$ and $y_l$ must be 1. The value of $y_l$ depends on the inputs $v_{l-1}, \ldots, v_{l-d}$, and it is computed from them by a circuit composed of three-input parity and majority gates. These gates compute self-dual Boolean functions: if the arguments of a self-dual function are complemented, then the value of the function is also complemented. The class of self-dual functions is closed under composition, so $y_l$ is a self-dual function of the inputs $v_{l-1}, \ldots, v_{l-d}$. If the arguments of a self-dual function are independent unbiassed bits, then the value of the function is also an unbiassed bit. Thus the probability that $y_l = 1$ is 1/2. The value of $x_l$ depends on the input $v_l$ as well as the $d$ inputs to its right, and we have

$$x_l = v_l \oplus \phi(v_{l-1}, \ldots, v_{l-d}),$$

where $\phi$ is some $d$-adic Boolean function. Since $v_l$ is an unbiassed bit independent of $v_{l-1}, \ldots, v_{l-d}$, $x_l$ is an unbiassed bit independent of $y_l$. Thus the probability that position $l$ generates a carry is 1/4.

For each of the remaining $k - 1$ positions of the $k$-block to propagate a carry, we must have $x_{l+j} \oplus y_{l+j} = 1$ for $1 \le j \le k - 1$. As between $x_{l+j}$ and $y_{l+j}$, only $x_{l+j}$ depends on $v_{l+j}$ and, as above, we have

$$x_{l+j} = v_{l+j} \oplus \phi(v_{l+j-1}, \ldots, v_{l+j-d}).$$

Thus each $x_{l+j}$ is an unbiassed bit independent of the bits to its right, so the probability that each of the remaining $k - 1$ positions propagates a carry is $1/2^{k-1}$, and the probability that a central $k$-block is active is $(1/4)(1/2^{k-1}) = 1/2^{k+1}$.
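
Principle (M-1) asserts an exact probability, so it can be checked exhaustively for small cases. The sketch below enumerates all $n$-bit inputs $V$ and counts those for which a given central $k$-block is active. The multipliers $M = 1011_2$ and $M = 10111_2$, the sizes $n = 10$ and $n = 12$, the use of the sequential order of carry-save additions, and all function names are assumptions made for illustration.

```python
def final_summands(m: int, v: int, n: int) -> tuple[int, int]:
    """The two summands X, Y entering the final addition (sequential reduction)."""
    d = m.bit_length() - 1
    mask = (1 << (n + d + 1)) - 1
    nums = [(v << s) & mask for s in range(d + 1) if (m >> s) & 1]
    while len(nums) > 2:
        f, g, h = nums[:3]
        a = f ^ g ^ h                                    # sum outputs
        b = (((f & g) | (f & h) | (g & h)) << 1) & mask  # shifted carry outputs
        nums = [a, b] + nums[3:]
    return nums[0], nums[1]


def block_active(x: int, y: int, start: int, k: int) -> bool:
    """Generate at `start`, propagate at the k-1 positions to its left."""
    generates = (x >> start) & (y >> start) & 1
    prop_mask = (1 << (k - 1)) - 1
    propagates = ((x ^ y) >> (start + 1)) & prop_mask
    return bool(generates) and propagates == prop_mask


def count_active(m: int, n: int, start: int, k: int) -> int:
    """Number of n-bit inputs v for which the k-block at `start` is active."""
    return sum(
        block_active(*final_summands(m, v, n), start, k) for v in range(1 << n)
    )


# One central block for M = 1011 (d = 3): exactly 2^n / 2^(k+1) inputs activate it.
count = count_active(0b1011, 10, 6, 2)
```

With $n = 10$ and $k = 2$ the count should equal $2^{10}/2^3 = 128$ for every central block, in agreement with (M-1).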

(M-2) The probability that a marginal $k$-block is active is at most $2^{2d}/2^k$.

The analysis of (M-1) applies to the $k - 2d$ or more positions of the $k$-block that do not overlap the rightmost $2d$ or leftmost $d + 1$ positions.

We shall say that two $k$-blocks are strongly non-overlapping if they, together with their extensions, are non-overlapping, and that they are weakly overlapping if they are non-overlapping, but one overlaps the extension of the other.

(M-3) If two $k$-blocks are overlapping, they cannot both be active.

This holds because at each position, generating a carry and propagating a carry are exclusive events.

(M-4) If a $k$-block $B$ lies to the right of, and is strongly non-overlapping with, a $k$-block $A$, then the event that $B$ is active is independent of the event that $A$ is active.

This holds because the activities of strongly non-overlapping $k$-blocks depend on disjoint sets of inputs.

(M-5) If a $k$-block $B$ overlaps the extension of a $k$-block $A$, but does not overlap $A$ itself, then the probability that $B$ is active, given that $A$ is active, is at most $2^d/2^{k+1}$.

The analysis of (M-1) applies to the $k - d$ or more rightmost positions of $B$ that do not overlap $A$ or its extension.

We shall show that (1.4) follows from these five principles. As before, we define $k_1$ by (2.1). Then (1.4) follows for $k \ge k_1$, since from (M-1), (M-2) and Markov's inequality, $\Pr(E_{n,k})$ is $O(1/n)$, as is $1 - e^{-n/2^{k+1}}$.

For $k < k_1$, we again define $k_0$ by (2.3), and begin by assuming that $k \ge k_0$ (as well as $k < k_1$). Using (M-2) and Markov's inequality, the probability that any marginal $k$-block is active is at most $(3d + 1)2^{2d}/2^k = O(\log n/n)$. Thus we may ignore marginal $k$-blocks, and turn our attention to estimating the probability of the event $E'_{n,k}$ that some central $k$-block is active. For this, we shall again use inclusion-exclusion:

$$\Pr(E'_{n,k}) = \sum_{j \ge 1} \sum_{B_1, \ldots, B_j} (-1)^{j-1} \Pr(B_1, \ldots, B_j \text{ all active}), \tag{4.1}$$

where the sum is over all lists $(B_1, \ldots, B_j)$ of $j$ central $k$-blocks, with $B_{i+1}$ to the right of $B_i$ for $1 \le i \le j - 1$. By (M-3), we may also assume that $B_1, \ldots, B_j$ are pairwise non-overlapping.

We shall partition the contributions to the double sum in (4.1) into two parts,

$$\Pr(E'_{n,k}) = S_I + S_{II},$$

where $S_I$ denotes the sum of the contributions from lists $B_1, \ldots, B_j$ that are pairwise strongly non-overlapping, and $S_{II}$ denotes the sum of the contributions from lists $B_1, \ldots, B_j$ for which at least one pair $B_i, B_{i+1}$ of successive $k$-blocks is weakly overlapping. The contributions to $S_I$ will be dealt with in ways that are completely analogous to those in the analysis of addition. For the contributions to $S_{II}$, we shall need to analyze the effects of weak overlaps, but in this case it will suffice to consider only the magnitudes of the contributions, without making any attempt to exploit cancellations.

For $S_I$, the only difference from the analysis of addition is that now the extended $k$-blocks each have length $k + d$, and the number of positions into which $j$ of them must fit is now $n - 2d + 1$. Thus the binomial coefficient that counts the number of ways that $j$ strongly non-overlapping central $k$-blocks can be chosen is $\binom{(n - 2d + 1) - j(k + d - 1)}{j}$. Since this quantity still satisfies the estimates

$$\binom{(n - 2d + 1) - j(k + d - 1)}{j} = \frac{n^j}{j!}\left(1 + O\!\left(\frac{(\log n)^3}{n}\right)\right)$$

for $j \le j_0$, where $j_0$ is again defined by (2.7), and

$$\binom{(n - 2d + 1) - j(k + d - 1)}{j} \le \frac{n^j}{j!} \le \left(\frac{en}{j}\right)^j$$

for all $j$, we can use (M-1), (M-3) and (M-4) in the analysis of Section 2 to show that

$$S_I = 1 - e^{-n/2^{k+1}} + O\!\left(\frac{\log n}{n^{1/3}}\right).$$

Turning to $S_{II}$, we abandon any attempt to exploit cancellation among the terms, and merely sum bounds on their magnitudes. We have

$$|S_{II}| \le \sum_{l \ge 1} \sum_{j \ge l+1} \binom{j-1}{l} \sum_{1 \le f_1, \ldots, f_l \le d} \binom{(n - 2d + 1) - (j - l)(k + d - 1) + g}{j - l} \left(\frac{1}{2^{k+1}}\right)^{j-l} \left(\frac{2^d}{2^{k+1}}\right)^{l},$$

where $g = f_1 + \cdots + f_l$. Here $l$ denotes the number of values of $i$ ($1 \le i \le j - 1$) such that $B_{i+1}$ overlaps the extension of $B_i$, the binomial coefficient $\binom{j-1}{l}$ counts the number of ways in which these values of $i$ may be chosen, the parameters $f_1, \ldots, f_l$ denote the amounts of overlap, the binomial coefficient $\binom{(n - 2d + 1) - (j - l)(k + d - 1) + g}{j - l}$ counts the number of ways in which the $j - l$ $k$-blocks or weakly overlapping sequences of $k$-blocks may be chosen, the factor $(1/2^{k+1})^{j-l}$ denotes the probability, following (M-1) and (M-4), that the $j - l$ $k$-blocks that do not overlap the extension of a $k$-block to their left are all active, and the factor $(2^d/2^{k+1})^l$ bounds the probability, following (M-5), that the remaining $l$ $k$-blocks are all active. Since the innermost sum has at most $d^l$ terms, each with $g \le ld$, the innermost binomial coefficient is at most $\binom{n}{j-l} \le n^{j-l}/(j - l)!$ and we obtain



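$$|S_{II}| \le \sum_{l \ge 1} \left(\frac{d\,2^d}{2^{k+1}}\right)^{l} \sum_{j \ge l+1} \binom{j-1}{l} \frac{1}{(j-l)!} \left(\frac{n}{2^{k+1}}\right)^{j-l} = \sum_{l \ge 1} \left(\frac{d\,2^d}{2^{k+1}}\right)^{l} \sum_{m \ge 1} \binom{m + l - 1}{l} \frac{1}{m!} \left(\frac{n}{2^{k+1}}\right)^{m},$$

where we have made the substitution $m = j - l$. We shall show below that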



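$$\sum_{m \ge 1} \binom{m + l - 1}{l} \frac{x^m}{m!} \le (4x)^l e^x \tag{4.2}$$

for $x \ge 1$ and $l \ge 1$. Using (2.4) and (2.6) we obtain

$$|S_{II}| \le \sum_{l \ge 1} \left(\frac{4 d\, 2^d\, n}{2^{2(k+1)}}\right)^{l} e^{n/2^{k+1}} = O\!\left(\frac{1}{n^{1/3}}\right)$$

for all sufficiently large $n$ and $k \ge k_0$. It remains to prove (4.2). We have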

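$$\sum_{m \ge 1} \binom{m + l - 1}{l} \frac{x^m}{m!} = \frac{x}{l!} \sum_{m \ge 1} \frac{d^l}{dx^l} \frac{x^{m+l-1}}{m!} = \frac{x}{l!} \frac{d^l}{dx^l} \left( x^{l-1} \sum_{m \ge 1} \frac{x^m}{m!} \right) = \frac{x}{l!} \sum_{0 \le s \le l} \binom{l}{s} \left(\frac{d^s}{dx^s} x^{l-1}\right) \left(\frac{d^{l-s}}{dx^{l-s}} (e^x - 1)\right) = \frac{x\, e^x}{l!} \sum_{0 \le s \le l-1} \binom{l}{s} \binom{l-1}{s}\, s!\; x^{l-1-s}.$$

Since $x \ge 1$, we have $x^{l-1-s} \le x^{l-1}$. Using the further inequalities $s! \le l!$ and

$$\sum_{0 \le s \le l-1} \binom{l}{s} \binom{l-1}{s} \le \sum_{0 \le s \le l} \binom{l}{s} \binom{l}{s} \le 4^l,$$

we obtain (4.2). This completes the proof of (1.4) for $k_0 \le k < k_1$.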

Finally, we must consider $k < k_0$. Again as in the analysis of addition, the fact that $\Pr[E'_{n,k}]$ is a non-increasing function of $k$, together with the bound (1.4) for $k = k_0$, yields (1.4) for the remaining values of $k$.

5. Conclusion 

In this paper we have shown that the distribution of the length of the longest carry-propagation chain can be analyzed using what Alon and Spencer [A] have called the "Poisson paradigm". We have also shown that this method of analysis can be used to show that a particular algorithm for multiplication of a random integer by a fixed constant has, to within terms tending to zero as $n \to \infty$, the same distribution for the length of the longest carry chain in the final addition. This algorithm is characterized by shifting over zeros in the multiplier, and by the use of a carry-save adder to incorporate the contributions for all but the last two non-zero digits of the multiplier. We should point out that our analysis does not appear to be applicable to either of two natural variants of this algorithm: one in which zeros are not shifted over, but cause a contribution of zero to be added using a carry-save adder (for in this case we cannot appeal to self-duality in the computation of the final summands), and one in which a carry-propagate adder is used for all additions (in which case it does not matter whether or not zeros are shifted over, for in this case the outputs of each adder depend on an unbounded number of input bits to their right). It remains an open question whether the result of this paper applies to either or both of these variants.

An apparently even more challenging problem is to determine whether or not the result of this paper applies to the algorithm considered here when the multiplier is not a fixed integer, but is rather a random integer with the same distribution as, but independent of, the multiplicand. This question has been studied empirically for the variants described above by Gilchrist, Pomerene and Wong [G] (for the use of a carry-propagate adder for each addition), and by Estrin, Gilchrist and Pomerene [E] (for the use of a carry-save adder). In each case, the answer is apparently affirmative.